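It is well known that the success of graph neural networks (GNNs) relies heavily on abundant human-annotated data, which is laborious to obtain in practice and not always available. How to develop effective GNNs when only a few labeled nodes are available remains understudied. Although self-training has proven powerful for semi-supervised learning, applying it to graph-structured data may fail because (1) larger receptive fields are not leveraged to capture long-range node interactions, which makes it harder to propagate feature-label patterns from labeled nodes to unlabeled nodes; and (2) limited labeled data makes it challenging to learn well-separated decision boundaries for different node classes without explicitly capturing the underlying semantic structure. To address the challenge of capturing informative structural and semantic knowledge, we propose a new graph data augmentation framework, AGST (Augmented Graph Self-Training), which is built with two new augmentation modules (structural and semantic) on top of a GST backbone. In this work, we investigate whether this framework can learn an effective graph prediction model with extremely limited labeled nodes. We conduct a comprehensive evaluation on semi-supervised node classification under different scenarios of limited labeled-node data. The experimental results demonstrate the unique contribution of the new data augmentation framework to node classification with few labeled data.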
Translated by Google Translate
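Existing work on anomaly detection (AD) relies on clean labels from human annotators, which are expensive to acquire in practice. In this work, we propose a method that leverages weak/noisy labels (e.g., risk scores generated by machine rules for detecting malware), which are cheaper to obtain for anomaly detection. Specifically, we propose AXPOE, the first framework that lets anomaly detection algorithms learn from noisy labels. In short, AXPOE uses a mixture-of-experts (MoE) architecture to encourage specialized and scalable learning from multiple noisy sources. It captures the similarities among noisy labels by sharing most model parameters, while encouraging specialization by building "expert" sub-networks. To further extract signal from the noisy labels, AXPOE also uses them as input features to facilitate expert learning. Extensive results on eight datasets (including a proprietary enterprise security dataset) demonstrate the effectiveness of AXPOE, which brings up to a 34% performance improvement over not using it. It also outperforms 13 leading baselines with equivalent network parameters and FLOPs. Notably, AXPOE is model-agnostic and enables any neural-network-based detection method to handle noisy labels; we showcase its results on both a multilayer perceptron (MLP) and the leading AD method DeepSAD.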
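The mixture-of-experts idea in the abstract above (shared parameters, per-expert sub-networks, and noisy labels reused as input features) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the dense softmax gate, the layer sizes, and the names `moe_forward` and `init_params` are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    # One dense layer: each row of `weights` produces one output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_params(in_dim, hidden_dim, num_experts, seed=0):
    # Hypothetical random initialisation; a real model would be trained.
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    return {
        "shared_w": rand(hidden_dim, in_dim), "shared_b": [0.0] * hidden_dim,
        "gate_w": rand(num_experts, in_dim), "gate_b": [0.0] * num_experts,
        "expert_w": rand(num_experts, hidden_dim), "expert_b": [0.0] * num_experts,
    }


def moe_forward(features, noisy_labels, params):
    # Assumption: noisy labels are simply appended to the feature vector.
    x = features + noisy_labels
    # Shared layer (ReLU): parameters common to every noisy source.
    hidden = [max(0.0, h)
              for h in linear(params["shared_w"], params["shared_b"], x)]
    # Gate: soft assignment of this sample to the experts.
    gate = softmax(linear(params["gate_w"], params["gate_b"], x))
    # Each expert is a small head on top of the shared representation.
    expert_scores = [linear([w], [b], hidden)[0]
                     for w, b in zip(params["expert_w"], params["expert_b"])]
    score = sum(g * s for g, s in zip(gate, expert_scores))
    return score, gate


params = init_params(in_dim=5, hidden_dim=4, num_experts=3)
score, gate = moe_forward([0.2, -1.0, 0.5], [1.0, 0.0], params)
print(score, gate)
```

Here three raw features and two noisy labels form a five-dimensional input; the gate weights always sum to one, so the anomaly score is a convex combination of the expert outputs.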
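Recent progress in natural language understanding (NLU) has been driven, in part, by benchmarks such as GLUE, SuperGLUE, and SQuAD. In fact, many NLU models now match or exceed "human-level" performance on many tasks in these benchmarks. However, most of these benchmarks give models access to relatively large amounts of labeled data for training. As a result, the models are provided with far more data than humans need to achieve strong performance. This has motivated a line of work focused on improving the few-shot learning performance of NLU models. However, there is a lack of standardized evaluation benchmarks for few-shot NLU, which leads to different experimental settings in different papers. To help accelerate this line of work, we introduce CLUES (Constrained Language Understanding Evaluation Standard), a benchmark for evaluating the few-shot learning capabilities of NLU models. We demonstrate that while recent models reach human performance when they have access to large amounts of labeled data, there is a large performance gap in the few-shot setting for most tasks. We also show differences between alternative model families and adaptation techniques in the few-shot setting. Finally, we discuss several principles and choices in designing experimental settings for evaluating true few-shot learning performance, and propose a unified, standardized approach to few-shot learning evaluation. Our aim is to encourage research on NLU models that can generalize to new tasks from a small number of examples. Code and data for CLUES are available at https://github.com/microsoft/clues.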
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, namely by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, while inappropriate data augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster while pushing away those from other clusters, by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
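As a rough illustration of the objective described in the abstract above, the sketch below scores two views of the high-confidence nodes: same-cluster pairs across views are treated as positives, and centers of different clusters across views as negatives. The exact loss form (`1 - cos` for positives, plain `cos` for negatives, equal weighting) and the function name `cluster_guided_loss` are assumptions for illustration, not the formulation used by CCGC.

```python
import math


def cosine(u, v):
    # Cosine similarity between two vectors (0.0 if either has zero norm).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def center(vectors):
    # Element-wise mean of a non-empty list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster_guided_loss(view1, view2, labels):
    """view1/view2: embeddings of high-confidence nodes; labels: cluster ids."""
    n = len(labels)
    clusters = sorted(set(labels))
    # Positive term: cross-view pairs drawn from the same cluster should align.
    pos = [1.0 - cosine(view1[i], view2[j])
           for i in range(n) for j in range(n) if labels[i] == labels[j]]
    # Negative term: centers of different clusters should be dissimilar.
    c1 = {c: center([view1[i] for i in range(n) if labels[i] == c]) for c in clusters}
    c2 = {c: center([view2[i] for i in range(n) if labels[i] == c]) for c in clusters}
    neg = [cosine(c1[a], c2[b]) for a in clusters for b in clusters if a != b]
    pos_term = sum(pos) / len(pos)
    neg_term = sum(neg) / len(neg) if neg else 0.0
    return pos_term + neg_term


view1 = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
view2 = [[0.95, 0.05], [1.0, 0.0], [0.05, 0.95], [0.0, 1.0]]
good = cluster_guided_loss(view1, view2, [0, 0, 1, 1])   # clusters match geometry
bad = cluster_guided_loss(view1, view2, [0, 1, 0, 1])    # clusters cut across it
print(good, bad)
```

With cluster assignments that agree with the embedding geometry the loss is small; with assignments that cut across it, both terms grow, which is the behaviour the objective is meant to penalize.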
As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient to build an importance map. We also conduct experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how importance maps from different IL models are connected. The results show that R2RISE successfully distinguishes important frames in the demonstrations.
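The masking-and-scoring loop described in the abstract above can be sketched in a few lines. This is a generic RISE-style estimate under stated assumptions: `importance_map` is a hypothetical name, and `toy_evaluate` is a stand-in for the expensive step of retraining the IL model on the masked demonstration and measuring its environment return.

```python
import random


def importance_map(num_frames, evaluate, num_masks=500, keep_prob=0.5, seed=0):
    # Average the return over all random masks that kept each frame.
    rng = random.Random(seed)
    weighted = [0.0] * num_frames
    kept = [0] * num_frames
    for _ in range(num_masks):
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        score = evaluate(mask)  # in the real setting: retrain, then roll out
        for i, m in enumerate(mask):
            if m:
                weighted[i] += score
                kept[i] += 1
    # max(k, 1) guards against a frame that was never kept.
    return [w / max(k, 1) for w, k in zip(weighted, kept)]


def toy_evaluate(mask):
    # Hypothetical return: only frames 2 and 5 contribute to performance.
    true_value = [0, 0, 1, 0, 0, 1, 0, 0]
    return sum(v * m for v, m in zip(true_value, mask))


scores = importance_map(8, toy_evaluate)
top_two = sorted(sorted(range(8), key=lambda i: scores[i], reverse=True)[:2])
print(top_two)
```

Frames whose presence coincides with higher returns receive higher scores; in this toy setup the two frames that actually drive the return rank highest.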
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
The Transformer has achieved impressive success in various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such a dataset is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement brought by ImageNet-pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer given limited medical data, we propose an auxiliary difficulty ranking task, in which the Transformer must identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer learns to distill transformation-invariant features from the perturbed tokens so as to simultaneously measure difficulty and maintain the consistency of the self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification, compared to ImageNet-pretrained weights and state-of-the-art self-supervised learning approaches.
When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a wide range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise, and cross-device discrepancy. Then, we systematically investigate 11 LiDAR semantic segmentation models spanning different input representations (e.g., point clouds, voxels, projected images), network architectures, and training schemes. Through this study, we obtain two insights: 1) The input representation plays a crucial role in robustness; specifically, different representations perform differently under specific corruptions. 2) Although state-of-the-art methods for LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) that greatly boosts robustness with simple but effective modifications. We hope that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.